Abstract. A new model of causal failure is presented and used to solve 
a novel replica placement problem in data centers. The model describes 
dependencies among system components as a directed graph. A replica 
placement is defined as a subset of vertices in such a graph. A criterion for 
optimizing replica placements is formalized and explained. In this work, 
the optimization goal is to avoid choosing placements in which a single 
failure event is likely to wipe out multiple replicas. Using this criterion, 
a fast algorithm is given for the scenario in which the dependency model 
is a tree. The main contribution of the paper is an $O(n + p^2)$ dynamic 
programming algorithm for placing $p$ replicas on a tree with $n$ vertices. 
This algorithm exhibits the interesting property that only two subproblems 
need to be recursively considered at each stage. An $O(n^2 p)$ greedy 
algorithm is also briefly reported. 


1 Introduction 


With the surge towards the cloud, our websites, services and data are increasingly 
being hosted by third-party data centers. These data centers are often 
contractually obligated to ensure that data is rarely, if ever, unavailable. One 
cause of unavailability is co-occurring component failures, which can result in 
outages that can affect millions of websites [13], and can cost millions of dollars 
in profits [11]. An extensive one-year study of availability in Google’s cloud 
storage infrastructure showed that such failures are relatively harmful. Their study 
emphasizes that “correlation among node failure dwarfs all other contributions 
to unavailability in our production environment” [4]. 

We believe that the correlation found among failure events arises due to 
dependencies among system components. Much effort has been made in the 
literature to produce quality statistical models of this correlation. But in using 
such models researchers do not make use of the fact that these dependencies can 
be explicitly modeled, since they are known to the system designers. In contrast, 
we propose a model wherein such dependencies are included, and demonstrate 
how an algorithm may make use of this information to optimize placement of 
data replicas within the data center. 

To achieve high availability, data centers typically store multiple replicas of 
data to tolerate the potential failure of system components. This gives rise to a 
placement problem, which, broadly speaking, involves determining which subset 
of nodes in the system should store a copy of a given file so as to maximize a 
given objective function (e.g., reliability, communication cost, response time, or 
access time). While our focus is on replica placements, we note that our model 
could also be used to place replicas of other system entities which require 
high availability, such as virtual machines and mission-critical tasks. 

In this work, we present a new model for causal dependencies among failures, 
and a novel algorithm for optimal replica placement in our model. An example 
model is given as Fig. 1, in which three identical replicas of the same block of 
data are distributed on servers in a data center. Each server receives power from 
a surge protector which is located on each server rack. In Scenario I, each replica 
is located on nodes which share the same rack. In Scenario II, each replica is 
located on separate racks. As can be seen from the diagram of Scenario I (Fig. 
1(a)), a failure in the power supply unit (PSU) on a single rack could result 
in a situation where every replica of a data block is completely unavailable, 
whereas in Scenario II (Fig. 1(b)), three PSUs would need to fail in order to 
achieve the same result. In practice, Scenario I is avoided by ensuring that each 
replica is placed on nodes which lie on separate racks. This heuristic is already 
part of known best-practices. Our observation is that this simple heuristic can 
be suboptimal under certain conditions. For example, consider a failure in the 
aggregation switch which services multiple racks. Such a failure could impact the 
availability of every data replica stored on those racks. Moreover, this toy example 
only represents a small fraction of the number of events that could be modeled 
in a large data center. 

While many approaches for replica placement have been proposed, our approach 
of modeling causal dependencies among failure events appears to be new. 
Other work on reliability in storage area networks has focused on objectives 
such as mean time to data loss [3,7]. These exemplify an approach towards 
correlated failure which we term “measure-and-conquer”. In measure-and-conquer 
approaches, a measured degree of correlation is given as a parameter to the 
model. In contrast, we model explicit causal relations among failure events which 
we believe give rise to the correlation seen in practice. In [7] the authors consider 
high-availability replica placement, but are primarily focused on modeling the 
effects of repair time. Later work begins to take into account information 
concerning the network topology, which is a step towards our approach. Similar 
measure-and-conquer approaches are taken in [1,4,8,14]. More recently, Pezoa 
and Hayat [10] have presented a model in which spatially correlated failures 
are explicitly modeled. However, they consider the problem of task allocation, 
whereas we are focused on replica placement. In the databases community, work 
on replica placement primarily focuses on finding optimal placements in 
storage-area networks with regard to a particular distributed access model or mutual 
exclusion protocol [5,12,15]. In general, much of the work from this community 
focuses on specialized communication networks and minimizing communication 
costs: system models and goals which are substantially different from our own. 

Fig. 1: Two scenarios represented by directed trees. 

Recently, there has been a surge of interest in computer science concerning 
cascading failure in networks [2,6,9,16]. While our model is most closely related 
to this work, the existing literature is primarily concerned with applications 
involving large graphs intended to capture the structure of the world-wide web, 
or power grids. The essence of all these models is captured in the threshold 
cascade model [2]. This model consists of a directed graph in which each node $v$ is 
associated with a threshold, $t(v) \in \mathbb{N}^+$. A node $v$ experiences a cascading failure 
if at least $t(v)$ of its incoming neighbors have failed. This model generalizes our 
own, wherein we pessimistically assume that $t(v) = 1$ for all nodes $v$. Current 
work in this area is focused on network design [2], exploring new models [6,9], 
and developing techniques for adversarial analysis [16]. To our knowledge, no 
one has yet considered the problem of replica placement in such models. 


2 Model 

We model dependencies among failure events as a directed graph, where nodes 
represent failure events, and a directed edge from u to v indicates that the 
occurrence of failure event u could trigger the occurrence of failure event v. We 
refer to this graph as the failure model. 

Given such a graph as input, we consider the problem of selecting nodes 
on which to store data replicas. Roughly, we define a placement problem as 
the problem of selecting a subset of these vertices, hereafter referred to as a 
placement, from the failure model so as to satisfy some safety criterion. In our 
application, only those vertices which represent storage servers are candidates to 
be part of a placement. We refer to such vertices as placement candidates. Note 
that the graph also contains vertices representing other types of failure events, 
which may correspond to real-world hardware unsuitable for storage (such as 
a ToR switch), or even to abstract events which have no associated physical 
component. In most applications, the set of placement candidates forms a subset 
of the set of vertices. 

More formally, let $E$ denote the set of failure events, and $C$ denote the set of 
placement candidates. We are interested in finding a placement of size $p$, which 
is defined to be a set $P \subseteq C$ with $|P| = p$. Throughout this paper we will use $P$ 
to denote a placement, and $p$ to denote its size. We consistently use $C$ to denote 
the set of placement candidates, and $E$ to denote the set of failure events. 

Let $G = (V, A)$ be a directed graph with vertices in $V$ and edges in $A$. The 
vertices represent both events in $E$ and candidates in $C$, so let $V = E \cup C$. A 
directed edge between events $e_1$ and $e_2$ indicates that the occurrence of failure 
event $e_1$ can trigger the occurrence of failure event $e_2$. A directed edge between 
event $e$ and candidate $c$ indicates that the occurrence of event $e$ could 
compromise candidate $c$. We will assume failure to act transitively. That is, if a 
failure event occurs, all failure events reachable from it in $G$ also occur. This is a 
pessimistic assumption which leads to a conservative interpretation of failure. 
We now define the notions of failure number and failure aggregate. 

Definition 1. Let $e \in E$. The failure number of event $e$, denoted $f(e, P)$, for a 
given placement $P$, is defined as the number of candidates in $P$ whose correct 
operation could be compromised by occurrence of event $e$. In particular, 

$$f(e, P) = |\{\, p \in P \mid p \text{ is reachable from } e \text{ in } G \,\}|.$$ 
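As a minimal sketch of Definition 1, the failure number can be computed with a breadth-first search from the event. The adjacency-dictionary representation, the function name `failure_number`, and the node names in `EXAMPLE_GRAPH` are illustrative assumptions and are not part of the model itself.

```python
from collections import deque

# Hypothetical failure model: one switch feeding two racks and three servers.
EXAMPLE_GRAPH = {
    "switch": ["rack1", "rack2"],
    "rack1": ["s1", "s2"],
    "rack2": ["s3"],
}


def failure_number(graph, event, placement):
    """Count the members of `placement` reachable from `event` in `graph`.

    `graph` maps each node to the list of nodes it can cause to fail.
    Failure is treated transitively, as assumed in the model.
    """
    seen = {event}
    queue = deque([event])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen & set(placement))
```

For instance, with the placement `{"s1", "s2"}` the event `"rack1"` compromises both replicas, while `"rack2"` compromises none.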

As an example, node $u$ in Fig. 1 has failure number 3 in Scenario I, and failure 
number 1 in Scenario II. The following property is an easy consequence of the 
above definition. A formal proof can be found in the appendix. 

Property 1. For any placement $P$ of replicas in tree $T$, if node $i$ has descendant 
$j$, then $f(j, P) \le f(i, P)$. 

The failure number captures a conservative criterion for a safe placement. 
Intuitively, we consider the worst case scenario, in which every candidate which 
could fail due to an occurring event does fail. Our goal is to find a placement 
which does not induce large failure numbers in any event. To aggregate this idea 
across all events, we define failure aggregate, a measure that accounts for the 
failure number of every event in the model. 

Definition 2. The failure aggregate of a placement $P$ is a vector in $\mathbb{N}^{p+1}$, 
denoted $f(P)$, where $f(P) := (p_p, p_{p-1}, \ldots, p_1, p_0)$, and each $p_i := |\{e \in E \mid f(e, P) = i\}|$. 

In Fig. 1, node $v$ has failure aggregate (2, 0, 0, 1) in Scenario I and failure 
aggregate (1, 0, 2, 0) in Scenario II. Failure aggregate is also computed in Fig. 6. 
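Definition 2 can be sketched in a few lines. The helper names `reachable` and `failure_aggregate`, the dictionary-based graph, and the node names in the usage note are assumptions made for illustration; the returned tuple is ordered as $(p_p, \ldots, p_1, p_0)$ to match the definition.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from `start` (including `start`)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def failure_aggregate(graph, events, placement):
    """Return (p_p, ..., p_1, p_0) for `placement`, counting only `events`."""
    placement = set(placement)
    p = len(placement)
    counts = [0] * (p + 1)
    for e in events:
        k = len(reachable(graph, e) & placement)  # failure number of e
        counts[p - k] += 1                        # index 0 holds p_p
    return tuple(counts)
```

With a hypothetical graph `{"switch": ["rack1", "rack2"], "rack1": ["s1", "s2"], "rack2": ["s3"]}` and events `["switch", "rack1", "rack2"]`, placing both replicas under `rack1` gives two events with failure number 2, whereas spreading them across racks leaves only the switch with failure number 2.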

In all of the problems considered in this paper, we are interested in 
optimizing $f(P)$. When optimizing a vector quantity, we must choose a meaningful 
way to totally order the vectors. In the context of our problem, we find that 
ordering the vectors with regard to the lexicographic order is both meaningful 
and convenient. The lexicographic order $<_L$ between $f(P) = (p_p, \ldots, p_1, p_0)$ and 
$f(P') = (p'_p, \ldots, p'_1, p'_0)$ is defined via the following formula: 

$$f(P) <_L f(P') \iff \exists\, m \ge 0,\ \forall\, i > m\ [\, p_i = p'_i \wedge p_m < p'_m \,].$$ 

To see why this is desirable, consider a placement $P$ which lexicominimizes $f(P)$ 
among all possible placements. Such a placement is guaranteed to minimize $p_p$, 
i.e. the number of nodes which compromise all of the entities in our 
placement. Further, among all solutions minimizing $p_p$, $P$ also minimizes $p_{p-1}$, the 
number of nodes compromising all but one of the entities in $P$, and so on for 
$p_{p-2}, p_{p-3}, \ldots, p_0$. Clearly, the lexicographic order prioritizes minimizing 
the entries of the vector from the highest index down. 

Throughout the paper, any time a vector quantity is maximized or minimized, 
we are referring to the maximum or minimum value in the lexicographic order. 
We will also use $f(P)$ to denote the failure aggregate, and $p_i$ to refer to the $i$th 
component of $f(P)$, where $P$ can be inferred from context. 
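The lexicographic comparison can be illustrated directly. The sketch below assumes failure aggregates are stored as tuples ordered $(p_p, \ldots, p_0)$; the function name `lex_less` is hypothetical. Under this storage order, Python's built-in tuple comparison gives the same result, which the usage note relies on.

```python
def lex_less(f, g):
    """Return True if vector f precedes g in the lexicographic order.

    Both vectors are tuples (p_p, ..., p_1, p_0) of equal length, so the
    first differing position is the highest index at which they differ.
    """
    for x, y in zip(f, g):
        if x != y:
            return x < y
    return False  # the vectors are equal


# Failure aggregates quoted in the text for Fig. 1.
scenario_one = (2, 0, 0, 1)
scenario_two = (1, 0, 2, 0)
```

Here `lex_less(scenario_two, scenario_one)` holds, so Scenario II is preferred, and `min(scenario_one, scenario_two)` selects the same vector.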

In the most general case, we could consider the following problem. 

Problem 1. Given graph $G = (V, A)$ with $V = C \cup E$, and positive integer $p$ with 
$p < |C|$, find a placement $P \subseteq C$ with $|P| = p$ such that $f(P)$ is lexicominimum. 

Problem 1 is NP-hard to solve, even in the case where $G$ is a bipartite graph. In 
particular, a reduction from independent set can be shown. However, the problem is 
tractable for special classes of graphs, one of which is the case wherein the graph 
forms a directed, rooted tree with leaf set $L$ and $C = L$. Our main contribution 
in this paper is a fast algorithm for solving Problem 1 in such a case. We briefly 
mention a greedy algorithm which solves the problem in $O(n^2 p)$ time. However, 
since $n \gg p$ in practice, our result of an $O(n + p^2)$ algorithm is much preferred. 

2.1 An $O(n^2 p)$ Greedy Algorithm 

The greedy solution to this problem forms a partial placement $P'$, to which new 
replicas are added one at a time, until $p$ replicas have been placed overall. $P'$ 
starts out empty, and at each step, the leaf $u$ which lexicominimizes $f(P' \cup \{u\})$ 
is added to $P'$. This greedy algorithm correctly computes an optimal placement, 
but its running time is $O(n^2 p)$ for a tree of unbounded degree. This running 
time comes about since each of the $p$ iterations requires visiting $O(|L|)$ leaves for inclusion. 
For each leaf $q$ which is checked, every node on a path from $q$ to the root must 
have its failure number computed. Both the length of a leaf-root path and the 
number of leaves can be bounded by $O(n)$ in the worst case, yielding the result. 
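The greedy procedure described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the tree is assumed to be given as a hypothetical `parent` dictionary (the root maps to `None`), only internal nodes are counted as failure events, and ties are broken by the order of `leaves`.

```python
def greedy_placement(parent, leaves, p):
    """Greedily pick p leaves, each time lexicominimizing the failure aggregate."""
    internal = {v for v in parent.values() if v is not None}
    count = {v: 0 for v in internal}  # current failure number of each event
    placement = []

    def ancestors(leaf):
        node = parent[leaf]
        while node is not None:
            yield node
            node = parent[node]

    def aggregate_if_added(leaf):
        trial = dict(count)
        for node in ancestors(leaf):      # only the leaf-root path changes
            trial[node] += 1
        vec = [0] * (p + 1)
        for c in trial.values():
            vec[p - c] += 1               # index 0 holds p_p
        return tuple(vec)

    for _ in range(p):
        free = [leaf for leaf in leaves if leaf not in placement]
        best = min(free, key=aggregate_if_added)  # tuple order is lexicographic
        placement.append(best)
        for node in ancestors(best):
            count[node] += 1
    return placement


# Hypothetical tree: root r with children a (leaves a1, a2) and b (leaf b1).
EXAMPLE_PARENT = {"r": None, "a": "r", "b": "r", "a1": "a", "a2": "a", "b1": "b"}
```

On `EXAMPLE_PARENT` with `p = 2`, the second replica goes under `b` rather than next to the first one under `a`, since that keeps only the root at failure number 2.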

That the greedy algorithm works correctly is not immediately obvious. It can 
be shown via an exchange argument that each partial placement found by the 
greedy algorithm is a subset of some optimal placement. This is the content of 
Theorem 1 below. 

To establish the correctness of the greedy algorithm, we first introduce some 
notation. For a placement $P$ and $S \subseteq V$, let $f(S, P) = (g_p, g_{p-1}, \ldots, g_1, g_0)$ where 
$g_i := |\{x \in S \mid f(x, P) = i\}|$. Intuitively, $f(S, P)$ gives the failure aggregate for 
all nodes in set $S \subseteq V$. We first establish the truth of two technical lemmas 
before stating and proving Theorem 1. 

Lemma 1. Let $r$ be the root of a failure model given by a tree. Given $P \subseteq C$, 
$a, b \in C - P$. If $f(r \leadsto a, P) <_L f(r \leadsto b, P)$ then $f(P \cup \{a\}) <_L f(P \cup \{b\})$. 

Proof. Suppose $f(r \leadsto a, P) <_L f(r \leadsto b, P)$. Let nodes on the paths from $r$ to 
$a$ and from $r$ to $b$ be labeled as follows: 

$$r \to a_1 \to a_2 \to \cdots \to a_n \to a$$ 
$$r \to b_1 \to b_2 \to \cdots \to b_m \to b$$ 

We proceed in two cases. 

In the first case, there is some $1 \le i \le \min(m, n)$ for which $f(a_i, P) < 
f(b_i, P)$. Let $i$ be the minimum such index, and let $f(b_i, P) = k$. Clearly, 
$f(P \cup \{a\})_k < f(P \cup \{b\})_k$, since $P \cup \{b\}$ counts $b_i$ as having failure number $k$ 
and $P \cup \{a\}$ does not. Moreover, since $f(a_\ell, P) = f(b_\ell, P)$ for all $\ell < i$, we have 
that for all $j > k$, $f(P \cup \{a\})_j = f(P \cup \{b\})_j$ by Property 1. 

In the second case, $f(a_i, P) \ge f(b_i, P)$ for all $1 \le i \le \min(m, n)$. In this case, 
if $f(a_i, P) > f(b_i, P)$ for some $i$, the only way we could have $f(r \leadsto a, P) <_L 
f(r \leadsto b, P)$ is if there is some $j > i$ with $f(a_j, P) < f(b_j, P)$, but this is a 
contradiction. Therefore, $f(a_i, P) = f(b_i, P)$ for all $1 \le i \le \min(m, n)$. So, we must 
also have $n \le m$, since if $n > m$, we would have $f(r \leadsto a, P) >_L f(r \leadsto b, P)$. 
Moreover, since $f(r \leadsto a, P) <_L f(r \leadsto b, P)$, we must have that $n < m$, for 
if $n = m$, we would have $f(r \leadsto a, P) = f(r \leadsto b, P)$, a contradiction. We 
have just shown the existence of some node $b_{n+1}$, for which we must have that 
$f(b_{n+1}, P) \le f(a_n, P)$. Notice that the path $r \leadsto a$ does not have an $(n+1)$st 
node, so it is clear that if $f(b_{n+1}, P) = k$, then $f(P \cup \{a\})_k < f(P \cup \{b\})_k$. 
Finally, since $n < m$, we have by Property 1 that $f(a_i, P) \ge f(a_n, P) \ge k$ for all 
$1 \le i \le n$. By an additional application of Property 1 it is easy to see that for 
all $j > k$, we have $f(P \cup \{a\})_j = f(P \cup \{b\})_j$. □ 

From Lemma 1 we obtain the following result as an easy corollary. 

Corollary 1. Let $r$ be the root of a failure model given by a tree. Given $P \subseteq C$, 
$a, b \in C - P$. Then $f(r \leadsto a, P) \le_L f(r \leadsto b, P)$ if and only if $f(P \cup \{a\}) \le_L 
f(P \cup \{b\})$. 

Proof. Suppose $f(r \leadsto a, P) \le_L f(r \leadsto b, P)$. If $f(r \leadsto a, P) = f(r \leadsto b, P)$, then 
since the only nodes which change failure number when considering placements 
$P$ and $P \cup \{a\}$ are those on the path $r \leadsto a$, and each of these nodes’ failure 
numbers increases by 1, we must have that $f(P \cup \{a\}) = f(P \cup \{b\})$, since the 
sequences of failure numbers in $r \leadsto a$ and $r \leadsto b$ are the same. If $f(r \leadsto a, P) <_L 
f(r \leadsto b, P)$ then by Lemma 1 the Corollary is proven. 

If instead $f(P \cup \{a\}) \le_L f(P \cup \{b\})$, and yet $f(r \leadsto a, P) >_L f(r \leadsto b, P)$, 
then by Lemma 1 we obtain that $f(P \cup \{a\}) >_L f(P \cup \{b\})$, a contradiction. □ 

Given a node u in a tree, let L(u) be the set of all leaves which are descendants 
of u. 

Lemma 2. Given $P \subseteq C$, $a, b \in C$. Let $c$ be the least common ancestor of $a$ and 
$b$, and let $d$ be the child of $c$ on the path from $c$ to $a$. If $f(r \leadsto a, P) \le_L f(r \leadsto b, P)$ 
and $X \subseteq C - \{a, b\}$ for which $L(d) \cap X = \emptyset$, and $a, b \notin X$, then 

$$f(P \cup X \cup \{a\}) \le_L f(P \cup X \cup \{b\}).$$ 

Proof. We have that $f(r \leadsto a, P) \le_L f(r \leadsto b, P)$. Consider $f(r \leadsto a, P \cup X)$ and 
$f(r \leadsto b, P \cup X)$. We wish to show that $f(r \leadsto a, P \cup X) \le_L f(r \leadsto b, P \cup X)$. 
Since $c$ is the least common ancestor of $a$ and $b$, it is clear that nodes on $r \leadsto c$ 
have equivalent failure numbers in both cases. Therefore it suffices to show that 
$f(c \leadsto a, P \cup X) \le_L f(c \leadsto b, P \cup X)$. 

Note that since $L(d) \cap X = \emptyset$, we have that $f(c \leadsto a, P \cup X) = f(c \leadsto a, P)$. 
Moreover, since the addition of nodes in $X$ cannot cause failure numbers on the 
path $c \leadsto b$ to decrease, we must have that $f(c \leadsto b, P) \le_L f(c \leadsto b, P \cup X)$. 
Altogether, we have that 

$$f(c \leadsto a, P \cup X) = f(c \leadsto a, P) \le_L f(c \leadsto b, P) \le_L f(c \leadsto b, P \cup X).$$ 

By applying Corollary 1 we obtain that $f(P \cup X \cup \{a\}) \le_L f(P \cup X \cup \{b\})$. □ 



Fig. 2: Named nodes used in Theorem 1. The arrow labeled “swap” illustrates 
the leaf nodes between which replicas are moved, and is not an edge of the graph. 


Theorem 1. Let $P_i$ be the partial placement from step $i$ of the greedy algorithm. 
Then there exists an optimal placement $P^*$, with $|P^*| = p$, such that $P_i \subseteq P^*$. 

Proof. The proof proceeds by induction on $i$. $P_0 = \emptyset$ is clearly a subset of any 
optimal solution. Given $P_i \subseteq P^*$ for some optimal solution $P^*$, we must show 
that there is an optimal solution $Q^*$ for which $P_{i+1} \subseteq Q^*$. Clearly, if $P_{i+1} \subseteq P^*$, 
then we are done, since $P^*$ is optimal. In the case where $P_{i+1} \not\subseteq P^*$ we must 
exhibit some optimal solution $Q^*$ for which $P_{i+1} \subseteq Q^*$. Let $u$ be the leaf which 
was added to $P_i$ to form $P_{i+1}$. Let $v$ be the leaf in $P^* - P_{i+1}$ which has the 
greatest-depth least common ancestor with $u$, where the depth of a node is 
given by its distance from the root (see Fig. 2). We set $Q^* = (P^* - \{v\}) \cup \{u\}$, 
and claim that $f(Q^*) \le_L f(P^*)$. Since $f(P^*)$ is optimal, and $P_{i+1} \subseteq Q^*$, this 
will complete our proof. 

Clearly, $f(a \leadsto u, P_i) \le_L f(a \leadsto v, P_i)$, since otherwise $f(r \leadsto u, P_i) >_L 
f(r \leadsto v, P_i)$, implying that $f(P_i \cup \{u\}) >_L f(P_i \cup \{v\})$, contradicting our use 
of a greedy algorithm. 

Note that $u, v \notin (P^* - P_i - \{v\})$. Moreover, by choice of $v$, we have that 
$L(a) \cap (P^* - P_i - \{v\}) = \emptyset$, since the only nodes from $P^*$ in $L(a)$ must also be 
in $P_i$. To complete the proof, we apply Lemma 2, setting $X = P^* - P_i - \{v\}$. 
This choice of $X$ is made so as to yield the following equalities. 

$$Q^* = (P^* - \{v\}) \cup \{u\} = P_i \cup (P^* - P_i - \{v\}) \cup \{u\},$$ 
$$P^* = P_i \cup (P^* - P_i - \{v\}) \cup \{v\}.$$ 

By Lemma 2 we obtain the inequality in the following formula, 

$$f(Q^*) = f(P_i \cup (P^* - P_i - \{v\}) \cup \{u\}) \le_L f(P_i \cup (P^* - P_i - \{v\}) \cup \{v\}) = f(P^*),$$ 

thereby completing the proof. □ 

Fig. 3: Round-robin placement cannot guarantee optimality. 

Fig. 4: Nodes used in Theorem 2. 

3 Balanced Placements 

Consider a round-robin placement in which the set of replicas placed at each 
node is distributed among its children, one replica per child, until all replicas 
have been placed. This process is then continued recursively at the children. 
Throughout the process, no child is given more replicas than its subtree has leaf 
nodes. This method has intuitive appeal, but it does not compute an optimal 
placement exactly, as can be seen from Fig. 3. Let placements $P_1$ and $P_2$ consist of 
the nodes labeled by 1 and 2 in Fig. 3, respectively. Note that both outcomes are 
round-robin placements. A quick computation reveals that $f(P_1) = (1, 1, 7, 0) \ne 
(1, 3, 3, 2) = f(P_2)$. Since the placements have different failure aggregates, 
round-robin placement alone cannot guarantee optimality. 

Key to our algorithm is the observation that any placement which 
lexicominimizes $f(P)$ must be balanced. If we imagine each child $c_i$ of $u$ as a bin of 
capacity $\ell_i$, balanced nodes are those in which all unfilled children are 
approximately “level”, and no child is filled while children of smaller capacity remain 
unfilled. These ideas are formalized in the following definitions. 

Definition 3. Let node $u$ have children indexed $1, \ldots, k$, and let the subtree rooted 
at the $i$th child of node $u$ have $\ell_i$ leaves, and $r_i$ replicas placed on it in placement 
$P$. A child for which $\ell_i - r_i = 0$ is said to be filled. A child for which $\ell_i - r_i > 0$ 
is said to be unfilled. 

Definition 4. Node $u$ is said to be balanced in placement $P$ iff: 

$$\ell_i - r_i > 0 \implies \forall j \in \{1, \ldots, k\}\ (r_i \ge r_j - 1).$$ 

Placement $P$ is said to be balanced if all nodes $v \in V$ are balanced. 
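Definition 4 translates into a short check on a single node. In the sketch below, the function name `is_balanced_node` and the list-based inputs (`capacities[i]` for $\ell_i$ and `replicas[i]` for $r_i$) are assumptions chosen for illustration.

```python
def is_balanced_node(capacities, replicas):
    """Check Definition 4 for one node, given its children's l_i and r_i."""
    for l_i, r_i in zip(capacities, replicas):
        if l_i - r_i > 0:  # child i is unfilled
            # An unfilled child may trail any sibling by at most one replica.
            if any(r_i < r_j - 1 for r_j in replicas):
                return False
    return True
```

For example, capacities `[3, 10]` with replicas `[3, 2]` are balanced (the first child is filled, and the unfilled child trails it by one), while capacities `[5, 5]` with replicas `[0, 2]` are not.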
Fig. 5: Placements $P_1$, $P_2$. 

Fig. 6: Failure numbers for $P_1$ (right) and $P_2$ (left). 


To motivate a proof that lexicominimum placements must be balanced, 
consider Fig. 5, in which $P_1$ and $P_2$ are sets containing leaf nodes labeled 1 and 2 
respectively. Fig. 6 presents two copies of the same tree, but with failure 
numbers labeled according to $P_1$ and $P_2$. Upon computing $f(P_1)$ and $f(P_2)$, we find 
that $f(P_1) = (2, 1, 3, 7) >_L (1, 1, 4, 7) = f(P_2)$. Note that for placement $P_1$, the 
root of the tree is unbalanced, therefore $P_1$ is unbalanced. Note also that $P_2$ 
is balanced, since each of its nodes is balanced. We invite the reader to verify 
that $P_2$ is an optimal solution for this tree. 

Our main result is that it is necessary for an optimal placement to be 
balanced. However, the balanced property alone is not sufficient to guarantee 
optimality. To see this, consider the two placements in Fig. 3. By definition, both 
placements are balanced, yet they have different failure aggregates. Therefore, 
balancing alone is insufficient to guarantee optimality. Despite this, we can use 
Theorem 2 to justify discarding unbalanced solutions as suboptimal. We exploit 
this property of optimal placements in our algorithm. 

Theorem 2. Any placement $P$ in which $f(P)$ is lexicominimum among all 
placements for a given tree must be balanced. 

Proof. Suppose $P$ is not balanced, yet $f(P)$ is lexicominimum among all 
placements. We proceed to a contradiction, as follows. 

Let $u$ be an unbalanced node in $T$. Let $v$ be an unfilled child of $u$, and let $w$ 
be a child of $u$ with at least one replica such that $r_v < r_w - 1$. Since $v$ is unfilled, 
we can take one of the replicas placed on $w$ and place it on $v$. Let $q_w$ be the leaf 
node from which this replica is taken, and let $q_v$ be the leaf node on which this 
replica is placed (see Fig. 4). Let $P^* := (P - \{q_w\}) \cup \{q_v\}$. We aim to show that 
$P^*$ is strictly better than $P$, contradicting the assumption that $f(P)$ is lexicominimum. 

Let $f(P) := (p_p, \ldots, p_0)$, and $f(P^*) := (p^*_p, \ldots, p^*_0)$. For convenience, we let 
$f(w, P) = m$. To show that $f(P^*) <_L f(P)$, we aim to prove that $p^*_m < p_m$, and 
that for any $k$ with $p \ge k > m$, $p^*_k = p_k$. We will concentrate on proving 
the former, and afterwards show that the latter follows easily. 

To prove $p^*_m < p_m$, observe that as a result of the swap, some nodes change 
failure number. These nodes all lie on the paths $v \leadsto q_v$ and $w \leadsto q_w$. Let $S^-$ 
(resp. $S^+$) be the set of nodes whose failure numbers change from $m$ (resp. change 
to $m$) as a result of the swap. Formally, we define 

$$S^- := \{x \in V \mid f(x, P) = m,\ f(x, P^*) \ne m\},$$ 

$$S^+ := \{x \in V \mid f(x, P) \ne m,\ f(x, P^*) = m\}.$$ 

By definition, $p^*_m = p_m - |S^-| + |S^+|$. We claim that $|S^-| \ge 1$ and $|S^+| = 0$, 
which yields $p^*_m < p_m$. To show $|S^-| \ge 1$, note that $f(w, P) = m$ by definition, 
and after the swap, the failure number of $w$ changes. Therefore, $|S^-| \ge 1$. 

To show $|S^+| = 0$, we must prove that no node whose failure number is 
affected by the swap has failure number $m$ after the swap has occurred. We 
choose to show a stronger result, that every such node’s failure number must be 
strictly less than $m$. Let $s_v$ be an arbitrary node on the path $v \leadsto q_v$, and consider 
the failure number of $s_v$. As a result of the swap, one more replica is counted 
as failed in each node on this path, therefore $f(s_v, P^*) = f(s_v, P) + 1$. Likewise, 
let $s_w$ be an arbitrary node on path $w \leadsto q_w$. One fewer replica is counted as 
failed in each node on this path, so $f(s_w, P^*) = f(s_w, P) - 1$. We will show that 
$f(s_w, P^*) < m$, and $f(s_v, P^*) < m$. 

First, note that for any $s_w$, by Property 1, $f(s_w, P^*) \le f(w, P^*) = m - 1 < m$. 
Therefore, $f(s_w, P^*) < m$, as desired. 

To show $f(s_v, P^*) < m$, note that by supposition $r_w - 1 > r_v$, and from this 
we immediately obtain $f(w, P) - 1 > f(v, P)$ by the definition of failure number. 
Now consider the nodes $s_v$, for which 

$$f(s_v, P) \le f(v, P) < f(w, P) - 1 = m - 1 \implies f(s_v, P^*) - 1 < m - 1,$$ 

where the first inequality is an application of Property 1 and the implication 
follows by substitution. Therefore $f(s_v, P^*) < m$ as desired. 

Therefore, among all nodes in $P^*$ whose failure numbers change as a result 
of the swap, no node has failure number $m$, so $|S^+| = 0$ as claimed. Moreover, 
since $f(s, P^*) < m$ for any node $s$ whose failure number changes as a result of 
the swap, we also have proven that $p_k = p^*_k$ for all $k$ where $p \ge k > m$. This 
completes the proof. □ 

4 An $O(np)$ Algorithm 

Our algorithm considers only placements which are balanced. To place p replicas, 
we start by placing p replicas at the root of the tree, and then proceed to assign 
these replicas to children of the root. We then recursively carry out the same 
procedure on each of the children. 

Before the recursive procedure begins, we obtain values of $\ell_i$ at each node by 
running breadth-first search as a preprocessing phase. The recursive procedure is 
then executed in two consecutive phases. During the divide phase, the algorithm 
is tasked with allocating $r(u)$ replicas placed on node $u$ to the children of $u$. After 
the divide phase, some child nodes are filled, while others remain unfilled. To 
achieve balance, each unfilled child $c_i$ will have either $r(c_i)$ or $r(c_i) - 1$ replicas 
placed upon it. The value of $r(c_i)$ is computed for each $c_i$ as part of the divide 
phase. The algorithm is then recursively called on each unfilled node to obtain 
values of optimal solutions for their subtrees. Nodes which are filled require 
no further processing. The output of this call is a pair of optimal failure 
aggregates, one supposing $r(c_i)$ replicas are placed at $c_i$, the other supposing 
$r(c_i) - 1$ are placed. Given these failure aggregates obtained from each child, 
the conquer phase then chooses whether to place $r(c_i)$ or $r(c_i) - 1$ replicas on 
each unfilled child so as to achieve a lexicominimum failure aggregate for node 
$u$ overall. For ease of exposition, we describe an $O(np)$ version of our algorithm 
in this section, and prove it correct. In Section 5 we then discuss improvements 
which can be used to obtain an $O(n + p^2)$ algorithm. Finally, we describe some 
tree transformations which can be used to obtain an $O(n + p \log p)$ algorithm in 
Section 6. 

4.1 Divide Phase 

When node $u$ is first considered, it receives at most two possible values for the 
number of replicas it could be asked to accommodate. Let these be the values 
$r(u)$ and $r(u) - 1$. Let $u$ have a list of children indexed $1, 2, \ldots, m$, with leaf 
capacities $\ell_i$, where $1 \le i \le m$. The divide phase determines which children will 
be filled and which will be unfilled. Filled children will have $\ell_i$ replicas placed 
on them in the optimal solution, while the number of replicas on the unfilled 
children is determined during the conquer phase. 

The set of unfilled children can be determined (without sorting) in an 
iterative manner using an $O(m)$ time algorithm similar to that for the Fractional 
Knapsack problem. The main idea of the algorithm is as follows: in each 
iteration, at least one-half of the children whose status is currently unknown are 
assigned a filled/unfilled status. To determine which half, the median capacity 
child (with capacity $\ell_{med}$) is found using the selection algorithm. Based upon 
the number of replicas that have not been assigned to the filled nodes, either 
a) the set of children $c_i$ with $\ell_i \ge \ell_{med}$ are labeled as “unfilled” or b) the set 
of children $c_i$ with $\ell_i \le \ell_{med}$ are labeled as “filled”. The algorithm recurses on 
the remaining unlabeled children. Pseudocode for this algorithm can be found 
in Algorithm 1. 

We briefly sketch the correctness of Algorithm 1. The following invariant 
holds after every execution of the while loop: 

$$\max(F) \cdot (|U| + |M|) < r - \sum_{c_i \in F} \ell_i < \min(U) \cdot |U| + \sum_{c_i \in M} \ell_i.$$ 

When $U = \emptyset$ or $F = \emptyset$ the invariant is not well-defined. These conditions are 
easy to test for: $U = \emptyset$ if and only if $\sum_i \ell_i = r(u)$, and $F = \emptyset$ if and only if 
$\ell_i > \lfloor r(u)/m \rfloor$ for all $i$. Hence in what follows, we will work only with cases where 
$U \ne \emptyset$ and $F \ne \emptyset$. At the end of the algorithm, $M = \emptyset$, and the invariant reduces 
to the following 

$$\max(F) < \frac{r - \sum_{c_i \in F} \ell_i}{|U|} < \min(U). \qquad (1)$$ 
Algorithm 1: Determines filled and unfilled nodes 

Function Get-Filled($M$, $r$) begin 
    $F \leftarrow \emptyset$ ; $U \leftarrow \emptyset$ ;    // F := filled children, U := unfilled children 
    while $M \ne \emptyset$ do 
        $\ell_{med} \leftarrow$ median capacity of children in $M$ ; 
        $M_1 \leftarrow \{c_i \in M \mid \ell_i < \ell_{med}\}$ ; 
        $M_2 \leftarrow \{c_i \in M \mid \ell_i = \ell_{med}\}$ ; 
        $M_3 \leftarrow \{c_i \in M \mid \ell_i > \ell_{med}\}$ ; 
        $x \leftarrow r - \sum_{c_i \in F \cup M_1 \cup M_2} \ell_i$ ;    // x to be distributed among $M_3 \cup U$ 
        if $x \ge \ell_{med} \cdot (|U| + |M_3|)$ then 
            $F \leftarrow F \cup M_1 \cup M_2$ ;    // $M_1 \cup M_2$ guaranteed filled 
            $M \leftarrow M - (M_1 \cup M_2)$ ; 
        else 
            $U \leftarrow U \cup M_2 \cup M_3$ ;    // $M_2 \cup M_3$ guaranteed unfilled 
            $M \leftarrow M - (M_2 \cup M_3)$ ; 
    return $(F, U)$ ;    // return filled and unfilled children 
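The following sketch mirrors the structure of Algorithm 1 in executable form. It is an illustration under stated assumptions rather than the linear-time procedure itself: the median is found with `statistics.median_low`, which sorts internally, so each iteration costs more than a true selection algorithm would; children are identified by their index in a hypothetical `capacities` list; and a tie in the test on `x` is resolved towards "filled".

```python
from statistics import median_low


def get_filled(capacities, r):
    """Split child indices into (filled, unfilled) when r replicas are placed.

    capacities[i] is the number of leaves below child i.
    """
    filled, unfilled = set(), set()
    unknown = set(range(len(capacities)))
    while unknown:
        med = median_low([capacities[i] for i in unknown])
        m1 = {i for i in unknown if capacities[i] < med}
        m2 = {i for i in unknown if capacities[i] == med}
        m3 = {i for i in unknown if capacities[i] > med}
        # Replicas left over if every child in filled, m1 and m2 is filled.
        x = r - sum(capacities[i] for i in filled | m1 | m2)
        if x >= med * (len(unfilled) + len(m3)):
            filled |= m1 | m2       # small children can all be filled
            unknown -= m1 | m2
        else:
            unfilled |= m2 | m3     # large children cannot be filled
            unknown -= m2 | m3
    return filled, unfilled
```

For capacities `[1, 2, 5, 10]` and `r = 9`, the two smallest children are filled (using 3 replicas) and the remaining 6 replicas are left to be balanced across the two larger, unfilled children. Each pass removes at least the median-capacity children from `unknown`, so the loop terminates.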


Equation 1 indicates that the average number of replicas placed on the unfilled 
nodes lies between the maximum value of F and the minimum value of U. From 
this, it is easy to see that the labeling is correct. Suppose that some filled child 
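$c_i \in F$ has been incorrectly classified. This child contains at most $\ell_i - 1$ replicas, 
and yet is still unfilled. Moreover, to attain the average, some unfilled child must 
be assigned at least $\left\lceil \frac{r - \sum_{c_j \in F} \ell_j}{|U|} \right\rceil$ replicas. Taking the difference of the number of 
replicas assigned to these two unfilled nodes, we have 

$$\left\lceil \frac{r - \sum_{c_j \in F} \ell_j}{|U|} \right\rceil - (\ell_i - 1) \ge \left\lceil \frac{r - \sum_{c_j \in F} \ell_j}{|U|} \right\rceil - \max(F) + 1 \ge (\max(F) + 1) - \max(F) + 1 = 2,$$ 

where the last step uses the left inequality of Equation 1 together with the fact that 
$\max(F)$ is an integer. This is a violation of the balanced placement property. 
Therefore, all children are correctly classified. This completes the proof sketch. 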

Suppose we know that we only need to find placements of size $r(u)$ and $r(u) - 1$ for node $u$. Moreover, we know that in an optimal placement of size $r(u)$, each child $c_i$ only needs to accommodate either $r(c_i)$ or $r(c_i) - 1$ replicas. Suppose that optimal placements of size $r(c_i)$ and $r(c_i) - 1$ are available at each child $c_i$. Theorem 3 shows that these placements are all that is required to compute optimal placements of size $r(u)$ and also of size $r(u) - 1$.

Theorem 3. In any case where $r(u)$ or $r(u) - 1$ replicas must be balanced among $k$ unfilled children, it suffices to consider placing either $\left\lceil \frac{r(u) - L}{k} \right\rceil$ or $\left\lfloor \frac{r(u) - L - 1}{k} \right\rfloor$ replicas at each unfilled child.

Proof. Let $s := r(u) - L$. Suppose $s \bmod k = 0$. If $s$ replicas are placed at $u$, then all unfilled children receive exactly $\frac{s}{k}$ $(= \lceil \frac{s}{k} \rceil)$ replicas. If $s - 1$ replicas are placed at $u$, one child gets $\frac{s}{k} - 1 = \lfloor \frac{s-1}{k} \rfloor$ replicas. If instead $s \bmod k > 0$, then the average number of replicas on each unfilled child is $\frac{s}{k} \notin \mathbb{Z}$. To attain this average using integer values, values both above and below $\frac{s}{k}$ are needed. However, since the unfilled children must be balanced, whatever values are selected must have absolute difference at most 1. The only two integer values satisfying these requirements are $\lceil \frac{s}{k} \rceil$ and $\lfloor \frac{s}{k} \rfloor$. But $\lfloor \frac{s}{k} \rfloor = \lfloor \frac{s-1}{k} \rfloor$ when $s \bmod k > 0$. □
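The claim of Theorem 3 is easy to check numerically. The sketch below is illustrative only: `balanced_shares` is a hypothetical helper that spreads a total over $k$ children so that any two shares differ by at most one, and `theorem3_values` returns the two values named in the theorem for a given $s = r(u) - L$.

```python
def balanced_shares(total, k):
    """Spread `total` replicas over k children so shares differ by at most 1."""
    q, rem = divmod(total, k)
    return [q + 1] * rem + [q] * (k - rem)


def theorem3_values(s, k):
    """The two candidate values from Theorem 3: ceil(s/k) and floor((s-1)/k)."""
    ceil_s_over_k = -(-s // k)        # integer ceiling without floats
    floor_s_minus_1_over_k = (s - 1) // k
    return {ceil_s_over_k, floor_s_minus_1_over_k}


def uses_only_theorem3_values(s, k):
    """True if balancing s or s-1 replicas uses only the two candidate values."""
    used = set(balanced_shares(s, k)) | set(balanced_shares(s - 1, k))
    return used <= theorem3_values(s, k)
```

For example, with $s = 7$ and $k = 3$ the shares are (3, 2, 2) for seven replicas and (2, 2, 2) for six, so only the values 3 and 2 ever appear.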

4.2 Conquer Phase 

Once the recursive call completes, we combine the results from each of the children to achieve the lexicographic minimum overall. Our task in this phase is to select $(r(u) - L) \bmod k$ unfilled children on which $\left\lceil \frac{r(u) - L}{k} \right\rceil$ replicas will be placed, and place $\left\lfloor \frac{r(u) - L - 1}{k} \right\rfloor$ replicas on the remaining unfilled children. We need to do this in such a way that the resulting placement is lexicominimum. Recall also that we must return two values, one for $r(u)$ and another for $r(u) - 1$. We show how to obtain a solution in the $r(u) - 1$ case using a greedy algorithm. A solution for $r(u)$ can easily be obtained thereafter. In this section, when two vectors are compared or summed, we are implicitly making use of an $O(p)$ function for comparing two vectors of length $p$ in lexicographic order.

Let $a_i$ (respectively $b_i$) represent the lexicominimum value of $f(P)$ where $P$ is any placement of $\left\lfloor \frac{r(u) - L - 1}{k} \right\rfloor$ (respectively $\left\lceil \frac{r(u) - L}{k} \right\rceil$) replicas on child $i$. Recall that $a_i, b_i \in \mathbb{N}^{p+1}$, and are available as the result of the recursive call. We solve the optimization problem by encoding the decision to take $b_i$ over $a_i$ as a decision variable $x_i \in \{0, 1\}$, for which either $x_i = 0$ if $a_i$ is selected, or $x_i = 1$ if $b_i$ is selected. The problem can then be described as an assignment of values to $x_i$ according to the following system of constraints, in which all arithmetic operations are performed point-wise.

$$\min \sum_i a_i + (b_i - a_i)\,x_i, \quad \text{subj. to: } \sum_i x_i = (r(u) - L) \bmod k. \qquad (2)$$

An assignment of $x_i$ which satisfies the requirements in (2) can be found by computing $b_i - a_i$ for all $i$, and greedily assigning $x_i = 1$ to those $i$ which have the $(r(u) - L) \bmod k$ smallest values of $b_i - a_i$. This is formally stated as

Theorem 4. Let $\pi := (\pi_1, \pi_2, \ldots, \pi_k)$ be a permutation of $\{1, 2, \ldots, k\}$ such that:

$$b_{\pi_1} - a_{\pi_1} \leq_L b_{\pi_2} - a_{\pi_2} \leq_L \cdots \leq_L b_{\pi_k} - a_{\pi_k}.$$

If vector $x = (x_1, \ldots, x_k)$ is defined according to the following rules: set $x_{\pi_i} = 1$ iff $i < (r(u) - L) \bmod k$, else $x_{\pi_i} = 0$, then $x$ is an optimal solution to (2).

The following Lemma greatly simplifies the proof of Theorem 4.

Lemma 3. $(\mathbb{Z}^n, +)$ forms a linearly-ordered group under $\leq_L$. In particular, for any $x, y, z \in \mathbb{Z}^n$, $x \leq_L y \implies x + z \leq_L y + z$.

A straightforward proof of Lemma 3 can be found in the appendix.

Proof (of Theorem 4). First, notice that a solution to (2) which minimizes the quantity $\sum_i (b_i - a_i)\,x_i$ also minimizes the quantity $\sum_i a_i + (b_i - a_i)\,x_i$. It suffices to minimize the former quantity, which can be done by considering only those values of $(b_i - a_i)$ for which $x_i = 1$. For convenience, we consider $x$ to be the characteristic vector of a set $S \subseteq \{1, \ldots, k\}$. We show that no other set $S'$ can yield a characteristic vector $x'$ which is strictly better than $x$ as follows.

Let $\alpha := (r(u) - L) \bmod k$, and let $S := \{\pi_1, \ldots, \pi_{\alpha - 1}\}$ be the first $\alpha - 1$ entries of $\pi$ taken as a set. Suppose that there is some $S'$ which represents a feasible assignment of variables to $x'$ for which $x'$ is a strictly better solution than $x$. Here $S' \subseteq \{1, \ldots, k\}$, with $|S'| = \alpha - 1$ and $S' \neq S$. Since $S' \neq S$ and $|S'| = |S|$, we have that $S - S' \neq \emptyset$ and $S' - S \neq \emptyset$. Let $i \in S - S'$ and $j \in S' - S$. We claim that we can form a better placement, $S^* = (S' - \{j\}) \cup \{i\}$. Specifically,

$$\sum_{\ell \in S^*} (b_\ell - a_\ell) \;\leq_L\; \sum_{m \in S'} (b_m - a_m), \qquad (3)$$

which implies that replacing a single element in $S'$ with one from $S$ does not cause the quantity minimized in (2) to increase.

To prove (3), note that $j \notin S$ and $i \in S \implies (b_i - a_i) \leq_L (b_j - a_j)$. We now apply Lemma 3, setting $x = (b_i - a_i)$, $y = (b_j - a_j)$, and $z = \sum_{\ell \in (S^* - \{i\})} (b_\ell - a_\ell)$. This yields

$$\sum_{\ell \in (S^* - \{i\})} (b_\ell - a_\ell) + (b_i - a_i) \;\leq_L\; \sum_{\ell \in (S^* - \{i\})} (b_\ell - a_\ell) + (b_j - a_j).$$

But since $S^* - \{i\} = S' - \{j\}$, we have that

$$\sum_{\ell \in (S^* - \{i\})} (b_\ell - a_\ell) + (b_i - a_i) \;\leq_L\; \sum_{m \in (S' - \{j\})} (b_m - a_m) + (b_j - a_j). \qquad (4)$$

Clearly, (4) $\implies$ (3), thereby proving (3). This shows that any solution which is not $S$ can be modified to swap in one extra member of $S$ without increasing the quantity minimized in (2). By induction, it is possible to include every element from $S$, until $S$ itself is reached. Therefore, $x$ is an optimal solution to (2). □

In the algorithm, we find an optimal solution to (2) by assigning $\left\lceil \frac{r(u) - L}{k} \right\rceil$ replicas to those children where $i$ is such that $1 \leq i < (r(u) - L) \bmod k$, and $\left\lfloor \frac{r(u) - L - 1}{k} \right\rfloor$ replicas to those remaining. To do this, we find the unfilled child having the $((r(u) - L) \bmod k)$th smallest value of $b_i - a_i$ using linear-time selection, and use the partition procedure from quicksort to find those children having values below the selected child. This takes time $O(kp)$ at each node.

At the end of the conquer phase, we compute and return the sum¹

$$\sum_{i < (r(u) - L) \bmod k} b_i \;+\; \sum_{i \geq (r(u) - L) \bmod k} a_i \;+\; \sum_{j\,:\,\text{filled}} f(P_j) \;+\; \mathbf{1}_{r(u)-1} \qquad (5)$$

where $P_j$ is the placement of replicas on child $j$ and $\mathbf{1}_{r(u)-1}$ is a vector of length $p$ having a one in entry $r(u) - 1$ and zeroes everywhere else. The term $\mathbf{1}_{r(u)-1}$ accounts for the failure number of $u$. This sum gives the value of an optimal placement of size $r(u) - 1$. Note there are $k + 1$ terms in the sum, each of which is a vector of length at most $p + 1$. Both computing the sum and performing the selection take $O(kp)$ time at each node, yielding $O(np)$ time overall.
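The greedy rule of Theorem 4 can be illustrated with a short Python sketch. It relies on the fact that Python tuples compare lexicographically. The names `greedy_select` and `brute_force`, and the parameter `m` (the number of children that receive the larger allotment $b_i$), are assumptions made for this example. For simplicity it sorts the differences rather than using the linear-time selection described in the text.

```python
from itertools import combinations


def vec_add(u, v):
    return tuple(x + y for x, y in zip(u, v))


def vec_sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def total_for(a, b, chosen):
    """Point-wise sum taking b[i] for chosen children and a[i] otherwise."""
    total = (0,) * len(a[0])
    for i in range(len(a)):
        total = vec_add(total, b[i] if i in chosen else a[i])
    return total


def greedy_select(a, b, m):
    """Pick the m children with lexicographically smallest b[i] - a[i]."""
    order = sorted(range(len(a)), key=lambda i: vec_sub(b[i], a[i]))
    chosen = set(order[:m])
    return chosen, total_for(a, b, chosen)


def brute_force(a, b, m):
    """Lexicographic minimum over every possible choice of m children."""
    return min(total_for(a, b, set(s)) for s in combinations(range(len(a)), m))
```

On small inputs the greedy total can be compared against `brute_force` to confirm that no other choice of `m` children yields a lexicographically smaller sum.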

We have only focused upon computing the value of the optimal solution. The 
solution itself can be recovered easily by storing the decisions made during the 
conquer phase at each node, and then combining them to output an optimal 
placement. 


5 An $O(n + p^2)$ Algorithm

An $O(n + p^2)$ running time can be achieved by an $O(n)$ divide phase, and an $O(p^2)$ conquer phase. The divide phase already takes at most $O(n)$ time overall, so to achieve our goal, we concern ourselves with optimizing the conquer phase. The conquer phase can be improved upon by making two changes. First, we modify the vector representation used for return values. Second, we transform the structure of the tree to avoid pathological cases.

In the remainder of the paper, we will use array notation to refer to entries of vectors. For a vector $v$, the $k$th entry of $v$ is denoted $v[k]$.


Compact Vector Representation. Observe that the maximum failure number returned from child $c_i$ is $r(c_i)$. This along with Property 1 implies that the vector returned from $c_i$ will have a zero in indices $p, p-1, \ldots, r(c_i) + 1$. To avoid wasting space, we modify the algorithm to return vectors of length only $r(c_i)$. At each node, we then compute (5) by summing entries in increasing order of their index. Specifically, to compute $v_1 + v_2 + \cdots + v_k$, where each vector $v_j$ has length $r(c_j)$, we first allocate an empty vector $w$, of size $r(u)$, to store the result of the sum. Then, for each vector $v_j$, we set $w[i] \leftarrow w[i] + v_j[i]$ for indices $i$ from 0 up to $r(c_j)$. After all vectors have been processed, $w = v_1 + \cdots + v_k$. This algorithm takes $r(c_1) + \cdots + r(c_k) = O(r(u))$ time. Using smaller vectors also implies that the $((r(u) - L) \bmod k)$th best child is found in $O(r(u))$ time, since each unfilled child returns a vector of size at most $\left\lceil \frac{r(u) - L}{k} \right\rceil$ and there are only $k$ unfilled children to compare. With these modifications the conquer phase takes $O(r(u))$ time at node $u$.
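The summation over compact vectors can be sketched as below. This is an illustrative example only: `sum_compact` and its `size` parameter are names chosen here, and Python lists stand in for the vectors. The point is that the work is proportional to the total length of the inputs, not to $k$ times the length of the result.

```python
def sum_compact(vectors, size):
    """Sum vectors of differing lengths into one result vector of length `size`.

    Entries beyond the end of a short vector are implicitly zero, so each
    input vector is only scanned up to its own length.
    """
    w = [0] * size
    for v in vectors:
        for i, entry in enumerate(v):
            w[i] += entry
    return w
```

For instance, `sum_compact([[1, 2], [3], [0, 1, 4]], 4)` touches six entries in total, one per stored value.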

¹ In the mentioned sum we assume, for notational convenience, that the vectors have been indexed in increasing order of $b_i - a_i$, although the algorithm performs no such sorting.

Tree Transformations. Note that for each $i$, nodes at depth $i$ have $O(p)$ replicas placed on them in total. We can therefore achieve an $O(p^2)$ time conquer phase overall by ensuring that the conquer phase only needs to occur in at most $O(p)$ levels of the tree. To do this, we observe that when $r(u) = 1$, any leaf with minimum depth forms an optimal placement. Recursive calls can therefore be stopped once $r(u) = 1$. To ensure that $r(u) = 1$ after $O(p)$ levels, we contract paths on which all nodes have degree two into a single pseudonode during the preprocessing phase. The length of this contracted path is stored in the pseudonode, and is accounted for when computing the sum. This suffices to ensure $r(u)$ decreases by at least one at each level, yielding an $O(n + p^2)$ algorithm.
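The path contraction step can be sketched in Python as follows. The tree representation (a dictionary from node to list of children), the function name `contract_paths`, and the returned `length` map are assumptions made for illustration. The text only specifies that each path of degree-two nodes becomes a single pseudonode that records the path length.

```python
def contract_paths(children, root):
    """Contract each maximal path of single-child nodes into its top node.

    Returns (new_children, length), where length[v] is the number of
    single-child nodes that pseudonode v stands for (1 for ordinary nodes).
    """
    new_children, length = {}, {}
    stack = [root]
    while stack:
        top = stack.pop()
        node, count = top, 0
        # Walk down while the current node has exactly one child.
        while len(children.get(node, [])) == 1:
            node = children[node][0]
            count += 1
        if count == 0:
            length[top] = 1
            new_children[top] = list(children.get(top, []))
        else:
            length[top] = count       # pseudonode replaces `count` chain nodes
            new_children[top] = [node]
        stack.extend(new_children[top])
    return new_children, length
```

After contraction, every pseudonode is followed directly by a leaf or a branching node, so no two consecutive levels consist only of single-child nodes.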

6 An $O(n + p \log p)$ Algorithm

In this section, we extend ideas about tree transformation from the last section to develop an algorithm in which the conquer phase only needs to occur in at most $O(\log p)$ levels. We achieve this by refining the tree transformations described in Section 5.

To ensure that there are only $O(\log p)$ levels in the tree, we transform the tree so as to guarantee that as the conquer phase proceeds down the tree, $r(u)$ decreases by at least a factor of two at each level. This happens automatically when there are two or more unfilled nodes at each node, since to balance the unfilled children, at most $\left\lceil \frac{r(u) - L}{2} \right\rceil$ replicas will be placed on each of them. Problems can therefore only arise when a tree has a path of nodes each of which has a single, unfilled child. We call such a path a degenerate chain. By detecting and contracting all such degenerate chains, we can achieve an $O(p \log p)$ conquer phase.
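The halving argument can be written out as a short derivation. Here $k \geq 2$ is the number of unfilled children of $u$ and $L$ is as in Theorem 3. The symbols $c$ (an unfilled child of $u$) and $u_j$ (a node reached after $j$ levels that each have at least two unfilled children) are introduced only for this illustration.

```latex
r(c) \;\le\; \left\lceil \frac{r(u) - L}{k} \right\rceil
     \;\le\; \left\lceil \frac{r(u)}{2} \right\rceil
\quad\Longrightarrow\quad
r(u_j) \;\le\; \left\lceil \frac{p}{2^{j}} \right\rceil ,
\qquad\text{so } r(u_j) = 1 \text{ once } j \ge \lceil \log_2 p \rceil .
```

The second step uses the identity $\lceil \lceil p/2 \rceil / 2 \rceil = \lceil p/4 \rceil$ repeatedly, starting from $r(\text{root}) = p$.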

Fig. 7(a) illustrates a degenerate chain. In this figure, each $T_i$ with $1 \leq i \leq t - 1$ is the set of all descendant nodes of $v_i$ which are filled. Thus, $v_1, \ldots, v_{t-1}$ each have only a single unfilled child (since each $v_i$ has $v_{i+1}$ as a child). In contrast, node $v_t$ has at least two unfilled children. It is easy to see that if the number of leaves in each $T_i$ is $O(1)$ then $t$, the length of the chain, can be as large as $O(p)$. This would imply that there can be $O(p)$ levels in the tree where the entire conquer phase is required. To remove degenerate chains, we contract nodes $v_1, \ldots, v_{t-1}$ into a single pseudonode $w$, as in Fig. 7(b). However, we must take care to ensure that the pair of vectors which pseudonode $w$ returns takes into account contributions from the entire contracted structure. We will continue to use $v_i$ and $T_i$ throughout the remainder of this section to refer to nodes in a degenerate chain.

To find and contract degenerate chains, we add an additional phase, the transform phase, which takes place between the divide and conquer phases. Recall that after the divide phase, the sets of filled and unfilled children are available at each node. Finding nodes in a degenerate chain is therefore easily done via a breadth-first search. We next consider what information must be stored in the pseudonode, to ensure that correct results are maintained.

Fig. 7: Illustration of a degenerate chain in which each $v_i$ where $1 \leq i \leq t - 1$ represents a node which has a single unfilled child. Panel (a): a degenerate chain. All filled descendants of node $v_i$ are collectively represented as $T_i$. In the figure on the right, nodes $v_1, \ldots, v_{t-1}$ have been contracted into pseudonode $w$.


Let $(a_w, b_w)$ be the pair of values which will be returned by pseudonode $w$ at the end of the conquer phase. In order for the transformation to be correct, the vectors $(a_w, b_w)$ must be the same as those which would have been returned at node $v_1$ had no transformation occurred. To ensure this, we must consider and include the contribution of each node in the set $T_1 \cup \ldots \cup T_{t-1} \cup \{v_1, \ldots, v_{t-1}\}$. It is easy to see that the failure numbers of nodes in $\{v_1, \ldots, v_{t-1}\}$ depend only upon whether $r(v_t)$ or $r(v_t) - 1$ replicas are placed on node $v_t$, while the filled nodes in sets $T_1, \ldots, T_{t-1}$ have no such dependency. Observe that if $r(v_t)$ replicas are placed on $v_t$, then $r(v_i)$ replicas are placed at each node $v_i$. If instead $r(v_t) - 1$ replicas are placed, then $r(v_i) - 1$ replicas are placed at each $v_i$. Since values of $r(v_i)$ are available at each node after the divide phase, enough information is present to contract the degenerate chain before the conquer phase is performed.

The remainder of this section focuses on the technical details needed to support our claim that the transform phase can be implemented in time $O(n + p \log p)$ overall. Let $S_w := T_1 \cup \ldots \cup T_{t-1} \cup \{v_1, \ldots, v_{t-1}\}$, and let the contribution of nodes in $S_w$ to $a_w$ and $b_w$ be given by vectors $a$ and $b$ respectively. The transform phase is then tasked with computing $a$ and $b$, and contracting the degenerate chain. We will show that this can be done in time $O(|S_w| + r(v_1))$ for each pseudonode $w$.

Pseudocode for the transform phase is given in Algorithm 2. The transform phase is started at the root of the tree by invoking Transform(root, false, $p$). Transform is a modified recursive breadth-first search. As the recursion proceeds down the tree, each node is tested to see if it is part of a degenerate chain (lines 2 and 8). If a node is not part of a degenerate chain, the call continues on all unfilled children (line 3). The first node ($v_1$) in a degenerate chain is marked by passing down chain $\leftarrow$ true at lines 11 and 13. The value of $r(v_1)$ is also passed down to the bottom of the chain at lines 11 and 13. Once the bottom of the chain (node $v_t$) has been reached, the algorithm allocates memory for three vectors, $a$, $b$ and $f$, each of size $r(v_1) + 1$ (line 7). These vectors are then passed up through the entire degenerate chain (line 21), along with node $u$, whose use will be explained later. When a node $u$ in a degenerate chain receives $a$, $b$, and $f$, $u$ adds its contribution to each vector (lines 14-18). The contribution of node $u$ consists of two parts. First, the contribution of the filled nodes is added to $f$ by invoking a special Filled subroutine (see Algorithm 3) which computes the sum of the failure aggregates of each filled child of $u$ (lines 14-15). Note that Filled uses pass-by-reference semantics when passing in the value of $f$. The contribution of node $u$ itself is then added, by summing the number of leaves in all of the filled children, and the number of replicas on the single unfilled child, $v$ (lines 16-18). By the time that the recursion reaches the start of the chain on the way back up (line 19), all nodes have added their contribution, and the pseudonode is created and returned (line 20).

Algorithm 2: Transform phase

 1  Function Transform($u$, chain, $r(v_1)$) begin
 2      if $u$ has two or more unfilled children then
 3          foreach child $c_i$ unfilled do
 4              $(-, -, -, x) \leftarrow$ Transform($c_i$, false, $\bot$);
 5              $c_i \leftarrow x$;
 6          if chain = false then return $(\bot, \bot, \bot, u)$;
 7          else return $(\vec{0}_{r(v_1)+1}, \vec{0}_{r(v_1)+1}, \vec{0}_{r(v_1)+1}, u)$;    // 3 · O(r(v_1)) time
 8      if $u$ has one unfilled child, $v$ then
 9          if chain = false then
10              // pass r(v) as max vector length
11              $(a, b, f, x) \leftarrow$ Transform($v$, true, $r(v)$);
12          else
13              $(a, b, f, x) \leftarrow$ Transform($v$, true, $r(v_1)$);
14          foreach filled child $c_i$ do
15              Filled($c_i$, $f$);    // O(n_i) time
16          $k \leftarrow \sum_i \ell_i + r(v) - 1$;
17          $a[k+1] \leftarrow a[k+1] + 1$;
18          $b[k] \leftarrow b[k] + 1$;
19          if chain = false then
20              $x \leftarrow$ Make-Pseudonode($a$, $b$, $f$, $x$);
21          return $(a, b, f, x)$

The transformation takes place as Transform is returned back up the tree. At the end of the degenerate chain, node $v_t$ is returned (lines 6-7), and this value is passed along the length of the entire chain (line 21), until reaching the beginning of the chain, where the pseudonode is created and returned (line 20). When the beginning of the chain is reached, the parent of $v_1$ updates its reference (line 5) to refer to the newly created pseudonode. At line 5 note that if $c_i$ was not the beginning of a degenerate chain, $x = c_i$ and the assignment has no effect (see lines 6-7).

Algorithm 3: Computes failure aggregate of filled nodes

Function Filled($u$, $f$) begin
    if $u$ is a leaf then
        $f[1] \leftarrow f[1] + 1$;
        return;
    foreach child $c_i$ do
        Filled($c_i$, $f$);
    $a \leftarrow$ number of leaves in the subtree rooted at $u$;
    $f[a] \leftarrow f[a] + 1$;
    return;


We provide pseudocode for the Filled and Make-Pseudonode subroutines in Algorithms 3 and 4. The Make-Pseudonode subroutine runs in $O(1)$ time. It is easy to see that the Filled routine runs in $O(n_i)$ time, where $n_i$ is the number of nodes in the subtree rooted at child $c_i$. The Transform routine therefore takes $O(|T_i|)$ time to process a single node $v_i$. The time needed for Transform to process an entire degenerate chain is therefore $O(|S_w|) + 3 \cdot O(r(v_1))$, where the $3 \cdot O(r(v_1))$ term arises from allocating memory for vectors $a$, $b$ and $f$ at the last node of the chain.
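A minimal Python sketch of the Filled subroutine follows. It rests on two assumptions that are not spelled out in this section: that every leaf of a filled subtree holds exactly one replica, and that $f$ is indexed by failure number, so a node with $a$ leaves below it contributes one to $f[a]$. Returning the leaf count from the recursive call is a convenience of this sketch, and the dictionary tree representation is hypothetical.

```python
def filled(children, u, f):
    """Add the failure aggregate of the filled subtree rooted at u into f.

    f is modified in place (mirroring pass-by-reference semantics).
    Returns the number of leaves in the subtree rooted at u.
    """
    kids = children.get(u, [])
    if not kids:
        f[1] += 1                 # a filled leaf holds one replica
        return 1
    leaves = sum(filled(children, c, f) for c in kids)
    f[leaves] += 1                # failing u destroys one replica per leaf below it
    return leaves
```

Each node is visited once, which matches the $O(n_i)$ bound stated above for a subtree with $n_i$ nodes.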


When we sum this time over all degenerate chains, we obtain a running time of $O(n + p \log p)$ for the transform phase. To reach this result, we examine the sum of $r(v_1)$ for all pseudonodes at level $i$. Since there are at most $p$ replicas at each level $i$, this sum can be at most $O(p)$ in any level. There are only $O(\log p)$ levels where $r(u) > 1$ after degenerate chains have been contracted; thus, pseudonodes can only be present in the first $O(\log p)$ levels of the tree. Therefore the $3 \cdot O(r(v_1))$ term sums to $O(p \log p)$ overall. Since $|S_w|$ clearly sums to $O(n)$ overall, the transform phase takes at most $O(n + p \log p)$ time.

Finally, after the transformation has completed, we can ensure that the value of $r(u)$ decreases by a factor of two at each level. This implies that there are only $O(\log p)$ levels where the conquer phase needs to be run in its entirety. Therefore, the conquer phase takes $O(p \log p)$ time overall. When combined with the $O(n)$ divide phase and the $O(n + p \log p)$ transform phase, this yields an $O(n + p \log p)$ algorithm for solving replica placement in a tree.

Algorithm 4: Creates and returns a new pseudonode

Function Make-Pseudonode($a$, $b$, $f$, $x$) begin
    allocate a new node node;
    node.a $\leftarrow a + f$;
    node.b $\leftarrow b + f$;
    node.child $\leftarrow x$;
    return node


7 Conclusion 

In this paper, we formulate the replica placement problem and show that it can be solved by a greedy algorithm in $O(n^2 p)$ time. In search of a faster algorithm, we prove that any optimal placement in a tree must be balanced. We then exploit this property to give an $O(np)$ algorithm for finding such an optimal placement. The running time of this algorithm is then improved, yielding an $O(n + p \log p)$ algorithm. An interesting next step would consist of proving a lower bound for this problem, and seeing how our algorithm compares. In future work we plan to consider replica placement on additional classes of graphs, such as special cases of bipartite graphs.

We would like to acknowledge insightful comments from S. Venkatesan and Balaji Raghavachari during meetings about results contained in this paper, as well as comments from Conner Davis on a draft version of this paper.

Appendix

Proof of Property 1

The following property from Section 2 is an easy result regarding failure numbers which is used in the proofs of Theorems 1 and 5.

Proof (of Property 1). Suppose that in $P$ there are $k_i$ replicas placed on leaves in the subtree rooted at $i$. If $i$ fails, $k_i$ replicas fail, yielding $s(i, P) = p - k_i$. Let $k_j$ be the number of replicas in the subtree rooted at $j$. Clearly, $k_j \leq k_i$, yielding the result. □


NP-Hardness of Problem 1

Problem 1 is NP-hard to solve exactly or approximately. In particular, we can reduce the well-known problems of Independent Set (IS) and Dominating Set (DS) to Problem 1. The reduction from DS shows that minimizing only the first entry of $f(P)$, $f_r$, is NP-hard, while the reduction from IS shows that lexico-minimizing the vector down to the second-to-last entry, $f_1$, is NP-hard. That is to say, if it were possible to lexico-minimize the vector $(f_r, \ldots, f_2, f_1)$ in polynomial time, it would imply P = NP.


Proof of Lemma 3

The proof of Theorem 4 is greatly simplified through use of an algebraic property of addition on $\mathbb{Z}^n$ under lexicographic order. Recall that a group is a pair $(S, \cdot)$, where $S$ is a set, and $\cdot$ is a binary operation which 1) is closed for $S$, 2) is associative, and has both 3) an identity and 4) inverses. A linearly-ordered group is a group $G = (S, \cdot)$, along with a linear order $\leq$ on $S$ in which for all $x, y, z \in S$, $x \leq y \implies x \cdot z \leq y \cdot z$, i.e. the linear order on $G$ is translation-invariant. Lemma 3 states that $\mathbb{Z}^n$ under $\leq_L$ has such a property.

Proof (of Lemma 3). It is well-known that $G = (\mathbb{Z}^n, +)$ is a group. To show $G$ is linearly-ordered, it suffices to show that

$$\forall\, x, y, z \in \mathbb{Z}^n : \; x \leq_L y \implies x + z \leq_L y + z.$$

If $x = y$ then surely $x + z = y + z \implies x + z \leq_L y + z$.

If instead $x <_L y$, then let $k = \min\{\, i : x_i < y_i \,\}$. Note that for all $i$ with $1 \leq i < k$, $x_i = y_i$, and that $x_k < y_k$. Consider $x + z$ and $y + z$. Surely, for all $1 \leq i < k$, $(x + z)_i = (y + z)_i$, since $x_i = y_i \implies x_i + z_i = y_i + z_i$. Likewise, $(x + z)_k < (y + z)_k$, since $x_k < y_k \implies x_k + z_k < y_k + z_k$.

Therefore, $x <_L y \implies x + z <_L y + z$. □
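Translation invariance can also be checked exhaustively on a small grid of integer vectors. The sketch below is only a sanity check under assumptions chosen here (the helper names and the finite value range), not a substitute for the proof. It uses Python's built-in lexicographic comparison of tuples as the order $\leq_L$.

```python
from itertools import product


def translate(x, z):
    """Point-wise sum of two equal-length integer vectors."""
    return tuple(a + b for a, b in zip(x, z))


def check_translation_invariance(values, n):
    """Check x <= y implies x + z <= y + z for all vectors over `values`^n."""
    vecs = list(product(values, repeat=n))
    return all(
        translate(x, z) <= translate(y, z)
        for x in vecs
        for y in vecs
        if x <= y
        for z in vecs
    )
```

With `values = (-1, 0, 1)` and `n = 2` this compares every ordered pair of the nine vectors under every translation.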